PPO-Clip updates the policy via: \[
\theta_{k+1} = \underset{\theta}{\mathrm{arg \space max}} \space \underset{s,a \sim \pi_{\theta_{k}}}{\mathrm{E}} \space \left[ L(s, a, \theta_k, \theta) \right]
\]
The objective function is: \[
L(s, a, \theta_k, \theta) = \min{\left( \frac{\pi_{\theta}(a|s)}{\pi_{\theta_{k}}(a|s)} A^{\pi_{\theta_{k}}}(s,a), \space\space g(\epsilon, A^{\pi_{\theta_{k}}}(s,a)) \right)}
\]
where: \[
g(\epsilon, A) = \left\{\begin{array}{cc}
(1 + \epsilon)A & \text{if } A \geq 0 \\
(1 - \epsilon)A & \text{if } A < 0
\end{array}\right.
\]
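
The definitions above can be sketched in a few lines of plain Python to show how the `min` limits the objective. This is only an illustrative sketch, not a reference implementation: the function names `g`, `ppo_clip_objective` and `ppo_clip_objective_mean` are hypothetical, `ratio` stands for \(\pi_{\theta}(a|s) / \pi_{\theta_{k}}(a|s)\), and the default `epsilon=0.2` is an assumed value chosen for the example.

```python
def g(epsilon: float, advantage: float) -> float:
    """Bound g(epsilon, A): (1 + epsilon) * A if A >= 0, else (1 - epsilon) * A."""
    if advantage >= 0:
        return (1 + epsilon) * advantage
    return (1 - epsilon) * advantage


def ppo_clip_objective(ratio: float, advantage: float, epsilon: float = 0.2) -> float:
    """Per-sample objective L = min(ratio * A, g(epsilon, A)).

    `ratio` is the probability ratio pi_theta(a|s) / pi_theta_k(a|s).
    epsilon=0.2 is an assumed default for illustration only.
    """
    return min(ratio * advantage, g(epsilon, advantage))


def ppo_clip_objective_mean(ratios, advantages, epsilon: float = 0.2) -> float:
    """Sample mean of L over a batch, approximating the expectation over (s, a)."""
    values = [ppo_clip_objective(r, a, epsilon) for r, a in zip(ratios, advantages)]
    return sum(values) / len(values)


# Positive advantage, ratio above 1 + epsilon: the objective is capped at (1 + epsilon) * A.
capped_high = ppo_clip_objective(ratio=1.5, advantage=2.0)

# Negative advantage, ratio below 1 - epsilon: the objective is capped at (1 - epsilon) * A.
capped_low = ppo_clip_objective(ratio=0.5, advantage=-1.0)

# Ratio inside [1 - epsilon, 1 + epsilon]: the unclipped term ratio * A is used.
unclipped = ppo_clip_objective(ratio=1.1, advantage=2.0)
```

With a positive advantage the objective stops growing once the ratio exceeds \(1 + \epsilon\), and with a negative advantage it stops growing once the ratio falls below \(1 - \epsilon\), so in both cases the new policy gains nothing from moving further away from \(\pi_{\theta_k}\).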